Sometimes two-dimensional visuals are not enough. There is often more in the data that can be used to put latent patterns in context. Many analysts tend to think in two dimensions, as with scatter plots, but a third dimension can reveal more. Let’s say we were given a nice, clean set of county-level data that contains the following:

What can you do with that data? Well, it turns out that these quantities are related. [3 lines of description go here]

How did we get to this?

#Set working directory
      setwd("/Users/sigmamonstr/Google Drive/DOC/0_Project Tracking/Commerce Academy/Storytelling_with_R/ACS_14")
      
#Load in data
      load("base_file.Rda")
      head(data)
##   state_fips region  region_name             id  id2
## 1         01      3 South Region 0500000US01117 1117
## 2         01      3 South Region 0500000US01115 1115
## 3         01      3 South Region 0500000US01057 1057
## 4         01      3 South Region 0500000US01129 1129
## 5         01      3 South Region 0500000US01049 1049
## 6         01      3 South Region 0500000US01055 1055
##                    geography pct_poverty emp_status households hs_grad
## 1     Shelby County, Alabama         8.6        6.2      74790    91.3
## 2  St. Clair County, Alabama        16.1        9.5      31673    82.4
## 3    Fayette County, Alabama        20.8       11.3       6967    76.1
## 4 Washington County, Alabama        15.9       19.8       6218    82.0
## 5     DeKalb County, Alabama        20.1        9.5      24743    73.2
## 6     Etowah County, Alabama        19.6       10.7      40001    82.1
#Load in Threejs library      
      library(threejs)

A basic 3-D scatter plot shows that there are direct relationships between unemployment, poverty, and educational attainment. But there isn’t much detail and the graph isn’t pretty.

        scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad)

Let’s stylize the plots. First, let’s name the axes with axisLabels, which accepts a vector of three axis names. The order matters and is as follows: x-axis, z-axis, y-axis.

      #Note that axis Labels should follow this order= c(x, z, y)
        scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad,   
                     axisLabels=c("unemployment","hs degree or above","poverty rate"))    

Now let’s change the rendering engine to give more depth to the plot. We do so by setting renderer = “canvas”. This tells threejs to draw the points with the canvas renderer rather than the default one.

      #Depth using render
        scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad, 
                       axisLabels=c("unemployment","hs degree or above","poverty rate"),
                       renderer="canvas")   

Now, let’s set the color of the points, resize them, and keep the y axis unflipped so that it ascends from the origin. To do so, we:

- set col = “slategrey”
- set flip.y = FALSE
- set size = 0.5

      #Point size, color, don't flip y axis
        scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad, 
                       axisLabels=c("unemployment","hs degree or above","poverty rate"),
                       renderer="canvas",  flip.y=FALSE, col="slategrey",
                       size=0.5)   

Ultimately, we want to find more patterns. By using color, we can distinguish regions and see that some are worse off than others. But which? It turns out there are four regions:

   unique(data$region_name)     
## [1] South Region     West Region      Northeast Region Midwest Region  
## Levels: Midwest Region Northeast Region South Region West Region
    unique(data$region)  
## [1] 3 4 1 2

First, let’s give each region a different color by creating a new variable, data$colors, and then assigning a hex code to each region.

      #Set up colors by region (hex codes must have exactly six digits)
        data$colors <- ""
        data$colors[data$region==1] <- "#011efe"
        data$colors[data$region==2] <- "#0bff01"
        data$colors[data$region==3] <- "#fe00f6"
        data$colors[data$region==4] <- "#fdfe02"
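
The same region-to-color assignment can also be written as a single lookup, which scales better if more groups are added later. The following is a minimal sketch, assuming region codes 1 to 4 as shown above; the `region` vector and the `region_colors` name are hypothetical stand-ins for illustration, not part of the original data.

```r
# Toy region codes standing in for data$region (illustrative values only).
region <- c(3, 4, 1, 2, 3)

# Named lookup vector: names are region codes, values are six-digit hex colors.
# `region_colors` is a hypothetical helper name.
region_colors <- c("1" = "#011efe", "2" = "#0bff01",
                   "3" = "#fe00f6", "4" = "#fdfe02")

# Index the lookup by the code as a character string; unname() drops the names.
colors <- unname(region_colors[as.character(region)])
print(colors)
```

With the real data, the equivalent line would be `data$colors <- unname(region_colors[as.character(data$region)])`.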

Now, let’s sort the data by region and set col = data$colors so that threejs knows which color to use for each of the roughly 3,000 points.

      data <- data[order(data$region),]
      #Grouped patterns
        scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad, 
                       axisLabels=c("unemployment","hs degree or above","poverty rate"),
                       col=data$colors,  flip.y=FALSE, 
                       renderer="canvas", 
                       size=0.5)   

It’s a bit annoying to look at the chart without knowing which point corresponds to which county. Let’s add labels for each point that show up upon mousing over.

      #add labels to points
      scatterplot3js(data$emp_status, data$pct_poverty,data$hs_grad,   
                     axisLabels=c("unemployment","hs degree or above","poverty rate"),
                     col=data$colors,
                     labels=paste0(data$region_name,": ",data$geography), 
                     size=0.5,
                    renderer="canvas")

In short, we can draw the following key insights from this graph.

Part 2: Maps

Sometimes graphs don’t get the point across. Maps, while overused, can give a better indication of geographic patterns.

Based on our 3-D graphs, we could see regions’ economic performance clustering. We can see the mess of points more clearly on a map. Observations:


Getting started

We can use the leaflet library to bring a geographic spin to the data. To start a map, we only need to load the leaflet library and then run the following:

      library(leaflet)
      leaflet()  

You’ll see that the map is blank, with a zoom control on the upper left. That’s because the map doesn’t have any layers yet. There are dozens of free base map layers we can use:

        leaflet() %>%
        addProviderTiles("Stamen.Toner") 
        leaflet() %>%
        addProviderTiles("CartoDB.Positron") 

Now let’s center and zoom in on the contiguous US:

        leaflet() %>%
        addProviderTiles("CartoDB.Positron") %>%
        setView(lng = -98.3, lat = 39.5, zoom = 4) 

We now need data. Let’s get a shapefile. (diagram of shapes goes here)

      shape_direct <- function(url, shp) {
        library(rgdal) ##provides readOGR
        temp <- tempfile(fileext = ".zip")
        download.file(url, temp, mode = "wb") ##download the URL target to the temp file (binary mode)
        unzip(temp, exdir = getwd()) ##unzip that file into the working directory
        return(readOGR(paste0(shp, ".shp"), shp)) ##read the named layer
      }
      
      shp <- shape_direct(url="http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_county_20m.zip",
                          shp= "cb_2014_us_county_20m")
## OGR data source with driver: ESRI Shapefile 
## Source: "cb_2014_us_county_20m.shp", layer: "cb_2014_us_county_20m"
## with 3220 features
## It has 9 fields
## Warning in readOGR(paste(shp, ".shp", sep = ""), shp): Z-dimension
## discarded

Add the shapefile to the map:

        leaflet(data=shp) %>%
        addProviderTiles("CartoDB.Positron") %>%
        setView(lng = -98.3, lat = 39.5, zoom = 4) %>%
        addPolygons(fillColor = "blue", 
                    fillOpacity = 0.8, 
                    color = "white", 
                    weight = 0.5)     

The shapefile on its own doesn’t contain the data from the scatter plot portion, so we need to join the two on the county identifier, GEOID.
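
The join key needs some care: id2 is stored as a number, so it has lost the leading zero that the shapefile’s five-character GEOID keeps (1117 versus "01117"). A minimal base-R sketch of the padding, plus a quick check for shapes that have no matching data row, is shown here. The `county_data` and `shape_ids` objects are made-up toy values for illustration, not the real tables.

```r
# Toy stand-ins for the two tables (illustrative values, not real data).
county_data <- data.frame(id2 = c(1117, 1115, 6037),
                          pct_poverty = c(10, 20, 30))
shape_ids <- c("01117", "01115", "06037", "72001")

# Pad the numeric id to five characters with leading zeros.
county_data$GEOID <- sprintf("%05d", as.integer(county_data$id2))

# Shape keys with no matching data row; these would get NA values after the join.
unmatched <- setdiff(shape_ids, county_data$GEOID)

print(county_data$GEOID)
print(unmatched)
```

Any key that shows up in `unmatched` corresponds to a polygon that will be drawn without a poverty value, so it is worth checking before mapping.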

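      library(stringr) #provides str_pad
      #Pad the numeric county id to a 5-character code so it matches the shapefile's GEOID
      data$GEOID <- str_pad(as.character(data$id2), 5, pad = "0")
      shp@data$GEOID <- as.character(shp@data$GEOID)
      #Join the county data onto the shapes by GEOID
      shp <- merge(shp, data, by = "GEOID")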
      
      
      pal <- colorQuantile("YlGn", NULL, n = 30)
      
      state_popup <- paste0("<strong>County: </strong>", 
                            shp@data$geography, 
                            "<br><strong>Poverty Rate (%): </strong>", 
                            shp@data$pct_poverty)
      
      leaflet(data = shp) %>%
        addProviderTiles("CartoDB.Positron") %>%
        setView(lng = -98.3, lat = 39.5, zoom = 4) %>%
        addPolygons(fillColor = ~pal(pct_poverty), 
                    fillOpacity = 0.8, 
                    color = "#BDBDC3", 
                    weight = 0.1, 
                    popup = state_popup)
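
The palette above colors counties by quantile, so each color bin holds roughly the same number of counties rather than covering an equal range of poverty rates. The idea can be sketched in base R as follows. This is an illustration under stated assumptions, not leaflet’s actual implementation: the values are made up, four bins are used instead of 30 to keep the output readable, and the two endpoint colors are an arbitrary light-yellow-to-dark-green choice.

```r
# Toy poverty rates (illustrative values only).
pct_poverty <- c(5, 8, 12, 15, 20, 22, 30, 41)

# Quartile break points: each bin should hold about the same number of values.
breaks <- quantile(pct_poverty, probs = seq(0, 1, length.out = 5))

# Assign each value to a bin 1..4; include.lowest keeps the minimum in bin 1.
bin <- cut(pct_poverty, breaks = breaks, labels = FALSE, include.lowest = TRUE)

# One color per bin, interpolated from light yellow to dark green.
ramp <- colorRampPalette(c("#ffffe5", "#004529"))(4)
fill <- ramp[bin]

print(table(bin))
```

Because the bins are defined by rank, a few counties with extreme poverty rates do not compress the rest of the map into a single shade.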